Abstract

This paper presents TDC, a typed feature-based repre- 
sentation language and inference system. Type defini- 
tions in TDC consist of type and feature constraints over 
the boolean connectives. TDC supports open- and closed- 
world reasoning over types and allows for partitions and 
incompatible types. Working with partially as well as 
with fully expanded types is possible. Efficient reasoning 
in TDC is accomplished through specialized modules. 
Topical Paper. Topic Area: software for NLP, gram- 
mar formalism for typed feature structures. 



1 Introduction 

Over the last few years, constraint-based grammar 
formalisms have become the predominant paradigm 
in natural language processing and computational 
linguistics. Their success stems from the fact that 
they can be seen as a monotonic, high-level represen- 
tation language for linguistic knowledge which can be 
given a precise mathematical semantics. The main
idea of representing as much linguistic knowledge as
possible through a unique data type called feature
structure allows the integration of different description
levels without taking care of interface problems.
While the first approaches relied on annotated phrase 
structure rules (e.g., PATR-II), modern formalisms 
try to specify grammatical knowledge as well as lexi- 
con entries entirely through feature structures. In or- 
der to achieve this goal, one must enrich the expres- 
sive power of the first unification-based formalisms 
with different forms of disjunctive descriptions. Later,
other operations came into play, e.g., (classical)
negation. Other proposals consider the integration of
functional/relational dependencies into the formalism,
which makes such formalisms in general Turing-complete
(e.g., ALE [4]). However, the most important extension to
formalisms consists of the incorporation of types, for
instance in modern systems like TFS [15], CUF [6],
or TDC [7]. Types are ordered hierarchically, as is
known from object-oriented programming languages.
This leads to multiple inheritance in the description 
of linguistic entities. Finally, recursive types are nec- 
essary to describe at least phrase-structure recursion 
which is inherent in all grammar formalisms which 
are not provided with a context-free backbone. 

In the next section, we argue for the need for and the
relevance of types in CL and NLP. After that, we
give an overview of TDC and its specialized inference
modules. In particular, we take a closer look at the
novel features of TDC and present the techniques we
have employed in implementing TDC.



2 Motivation 

Modern typed unification-based grammar formalisms
differ from early untyped systems in that they highlight
the notion of a feature type. Types can be arranged
hierarchically, where a subtype monotonically inherits
all the information from its supertypes and
unification plays the role of the primary
information-combining operation. A type definition can be
seen as an abbreviation for a complex expression, consisting
of type constraints (concerning the sub-/supertype
relationship) and feature constraints (stating the
appropriate attributes and their values) over the
connectives ∧, ∨, and ¬. Types serve as abbreviations
for lexicon entries, ID rule schemata, and universal
as well as language-specific principles, as is familiar
from HPSG. Besides serving as an abbreviational means,
as templates do, types have other advantages
which cannot be accomplished by templates:

• STRUCTURING KNOWLEDGE 

Types together with the possibility to order 
them hierarchically allow for a modular and 
clean way to represent linguistic knowledge ad- 
equately. Moreover, generalizations can be put 
at the appropriate levels of representation. 

• EFFICIENT PROCESSING 

Certain type constraints can be compiled into efficient
representations like bit vectors [1], where
a GLB (greatest lower bound), LUB (least upper
bound), or ≼ (type subsumption) computation
reduces to low-level bit manipulation; see Section
3.2. Moreover, types relieve untyped unification
of expensive computation through the possibility
to declare them incompatible. In addition,
working with type names only or with partially
expanded types minimizes the costs of copying
structures during processing. This can only be
accomplished if the system makes a mechanism
for type expansion available; see Section 3.4.

• TYPE CHECKING 

Type definitions allow a grammarian to declare
which attributes are appropriate for a given type
and which types are appropriate for a given
attribute, thereby preventing one from writing
inconsistent feature structures. Again, type expansion
is necessary to determine the global consistency
of a given description.

• RECURSIVE TYPES 

Recursive types give a grammar writer the opportunity
to formulate certain functions or relations
as recursive type specifications. Working
in the type deduction paradigm forces a
grammar writer to replace the context-free backbone
by recursive types. Here, parameterized
delayed type expansion is the ticket to the
world of controlled linguistic deduction [13]; see
Section 3.4.

3 TDC 

TDC is a unification-based grammar development
environment and run time system supporting HPSG-like
grammars. Work on TDC started within the
DISCO project of the DFKI [14] (this volume). The
DISCO grammar currently consists of approx. 900
type specifications written in TDC and is the largest
HPSG grammar for German [9]. The core engine of
DISCO consists of TDC and the feature constraint
solver UDiNe [3]. UDiNe itself is a powerful untyped
unification machinery which allows the use of
distributed disjunctions, general negation, and functional
dependencies. The modules communicate through
an interface, and this connection mirrors exactly the
way an abstract typed unification algorithm works:
two typed feature structures can only be unified if
the attached types are definitely compatible. This
is accomplished by the unifier in that UDiNe hands
over two typed feature structures to TDC, which gives
back a simplified form (plus additional information;
see Fig. 1). The motivation for separating type and
feature constraints and processing them in specialized
modules (which again might consist of specialized
components, as is the case in TDC) is twofold: (i)
this strategy reduces the complexity of the whole
system, thus making the architecture clear, and (ii) it
leads to higher performance of the whole system because
every module is designed to cover only a specialized
task.

3.1 TDC Language 

TDC supports type definitions consisting of type
constraints and feature constraints over the operators
∧, ∨, ¬, and ⊕ (xor). The operators are generalized
in that they can connect feature descriptions,
coreference tags (logical variables) as well as types.
TDC distinguishes between avm types (open-world
semantics), sort types (closed-world semantics), built-in
types (being made available by the underlying
Common Lisp system), and atoms. Recursive types are
explicitly allowed and handled by a sophisticated lazy
type expansion mechanism.

In asking for the greatest lower bound of two avm
types a and b which share no common subtype, TDC
always returns a ∧ b (open-world reasoning), and not
⊥. The reasons for this are manifold: (i) our linguistic
knowledge is partial, (ii) the approach is
in harmony with terminological (KL-ONE-like)
languages, which share a similar semantics, (iii) it is
important during incremental grammar/lexicon
construction (which has been shown useful in our project),
and (iv) one need not write superfluous type definitions
to guarantee successful type unifications during
processing.

The opposite case holds for the GLB of sort types
(closed-world approach). Furthermore, sort types differ
from avm types in that they are
not further structured, as is the case for atoms.
Moreover, TDC offers the possibility to declare partitions,
a feature heavily used in HPSG. In addition, one can
declare sets of types as incompatible, meaning that
the conjunction of them yields ⊥, so that specific avm
types can be closed.

TDC allows a grammarian to define and use param- 
eterized templates (macros). There exists a special 
instance definition facility to ease the writing of lex- 
icon entries which differ from normal types in that 
they are not entered into the type hierarchy. Input 
given to TDC is parsed by a Zebu-generated LALR(1)
parser [8] to allow for an intuitive, high-level input 
syntax and to abstract from uninteresting details im- 
posed by the unifier and the underlying Lisp system. 

The kernel of TDC (and of most other monoton- 
ic systems) can be given a set-theoretical semantics 
along the lines of [12]. It is easy to translate TDC 
statements into denotation-preserving expressions of 
Smolka's feature logic, thus viewing TDC only as
syntactic sugar for a restricted (decidable) subset of
first-order logic. Take for instance the following feature
description φ written as an attribute-value matrix:



[ np
  AGR  [x] [ agreement
             NUM  sg
             PERS 3rd ]
  SUBJ [x] ]



It is not hard to rewrite this two-dimensional de- 
scription to a flat first-order formula, where at- 
tributes/features (e.g., AGR) are interpreted as binary 
relations and types (e.g., np) as unary predicates: 

∃x . np(φ) ∧ AGR(φ, x) ∧ agreement(x) ∧
     NUM(x, sg) ∧ PERS(x, 3rd) ∧ SUBJ(φ, x)

The corresponding TDC type definition of φ looks as
follows (actually & is used on the keyboard instead
of ∧, | instead of ∨, ~ instead of ¬):

φ := np ∧ [AGR #x ∧ agreement ∧ [NUM sg, PERS 3rd],
           SUBJ #x].
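
The translation into a flat formula can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not the TDC implementation: the nested-dictionary representation of the matrix, the function names `flatten` and `to_formula`, and the ASCII spelling of the connectives are hypothetical, and every complex value is assumed to carry a coreference tag.

```python
def flatten(node, var, atoms, tag_vars):
    """Append first-order atoms describing `node`, which is denoted by `var`."""
    if node.get("type"):                      # types become unary predicates
        atoms.append(f"{node['type']}({var})")
    for attr, value in node.get("feats", {}).items():
        if isinstance(value, str):            # atomic value such as sg
            atoms.append(f"{attr}({var}, {value})")
            continue
        tag = value["tag"]                    # coreference tag, e.g. "x"
        atoms.append(f"{attr}({var}, {tag})") # attributes become binary relations
        if tag not in tag_vars:
            tag_vars.append(tag)
        flatten(value, tag, atoms, tag_vars)
    return atoms


def to_formula(node, var="phi"):
    """Return the existentially closed conjunction for the description."""
    atoms, tag_vars = [], []
    flatten(node, var, atoms, tag_vars)
    prefix = "".join(f"exists {v} . " for v in tag_vars)
    return prefix + " & ".join(atoms)


# The description phi from the text (hypothetical encoding).
phi = {
    "type": "np",
    "feats": {
        "AGR": {"tag": "x", "type": "agreement",
                "feats": {"NUM": "sg", "PERS": "3rd"}},
        "SUBJ": {"tag": "x"},
    },
}
formula = to_formula(phi)
```

Calling `to_formula(phi)` yields the same atoms as the formula above, in the order in which the attributes are visited.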

3.2 Type Hierarchy 

The type hierarchy is either called directly by the 
control machinery of TDC during the definition of a 
type (type classification) or indirectly via the simpli- 
fier both at definition and at run time (type unifica- 
tion). 

3.2.1 Encoding Method 

The implementation of the type hierarchy is based
on Aït-Kaci's encoding technique for partial orders
[1]. Every type t is assigned a code γ(t) (represented
via a bit vector) such that γ(t) reflects the reflexive
transitive closure of the subsumption relation with
respect to t. Decoding a code c is realized either
by a look-up (iff ∃t . γ⁻¹(c) = t) or by computing
the "maximal restriction" of the set of types whose
codes are less than c. Depending on the encoding
method, the hierarchy occupies O(n log n) (compact
encoding) resp. O(n²) (transitive closure encoding)
bits. Here, GLB/LUB operations directly correspond
to bit-or/and instructions. GLB, LUB and ≼
computations have the nice property that they can be
carried out in this framework in O(n), where n is the
number of types.¹

¹ Actually, one can choose in TDC between the two
encoding techniques and between bit vectors and bignums
in Common Lisp for the representation of the codes. In
our Lisp implementation, operations on bignums are an
order of magnitude faster than on bit vectors.

Proceedings of COLING-94

[Figure 1 (diagram): query and result exchanged between
UDiNe and TDC; the result has the form
({c, a ∧ b, ⊥}, {yes, no, fail}).]

Figure 1: Interface between TDC and UDiNe. Depending on the type hierarchy and the type of [1] and [2],
TDC either returns c (c is definitely the GLB of a and b) or a ∧ b (open-world reasoning) resp. ⊥ (closed-world
reasoning) if there doesn't exist a single type which is equal to the GLB of a and b. In addition, TDC determines
whether UDiNe must carry out feature term unification (yes) or not (no), i.e., the return type contains all the
information one needs to work on properly (fail signals a global unification failure).

Aït-Kaci's method has been extended in TDC to
cover the open-world nature of avm types in that
potential GLB/LUB candidates (calculated from their
codes) must be verified. The following example shows
why this is necessary:

x := y ∧ z
x' := y' ∧ z' ∧ [a 1]

During processing, one can safely replace y ∧ z
by x, but rewriting y' ∧ z' to x' is not correct,
because x' differs from y' ∧ z': x' is more specific as
a consequence of the feature constraint [a 1]. So we
make a distinction between the "internal" greatest
lower bound GLB_≼, concerning only the type
subsumption relation by using Aït-Kaci's method alone
(which is however used for sort types), and the
"external" one, GLB_⊑, which takes the subsumption
relation over feature structures into account.

With GLB_≼ and GLB_⊑ in mind, we can define a
generalized GLB operation informally by the following
table. This GLB operation is actually used during
type unification (fc = feature constraint):

GLB    | avm1     sort1    atom1    fc1
-------+----------------------------------
avm2   | see 1.   ⊥        ⊥        see 2.
sort2  | ⊥        see 3.   see 4.   ⊥
atom2  | ⊥        see 4.   see 5.   ⊥
fc2    | see 2.   ⊥        ⊥        see 6.

where

1. avm3  ⟺ GLB_⊑(avm1, avm2) = avm3
   avm1  ⟺ avm1 = avm2
   ⊥     ⟺ GLB_≼(avm1, avm2) = ⊥, via an
            explicit incompatibility declaration
   avm1 ∧ avm2, otherwise (open world)

2. avm1,2 ⟺ expand(avm1,2) ⊓ fc2,1 ≠ ⊥
   ⊥, otherwise

3. sort3 ⟺ GLB_≼(sort1, sort2) = sort3
   sort1 ⟺ sort1 = sort2
   ⊥, otherwise (closed world)

4. atom1,2 ⟺ type-of(atom1,2) ≼ sort2,1,
            where sort2,1 is a built-in
   ⊥, otherwise

5. atom1 ⟺ atom1 = atom2
   ⊥, otherwise

6. ⊤ ⟺ fc1 ⊓ fc2 ≠ ⊥
   ⊥, otherwise
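
The interplay of bit codes and open- vs. closed-world reasoning can be sketched as follows. This is a minimal illustration under stated assumptions, not the TDC code: it uses a transitive closure encoding in which the code of a type has one bit for the type itself and one for each of its subtypes, so that the candidate GLB is obtained with a bitwise AND (under the dual convention the roles of AND and OR are exchanged). Python integers play the role of bignums. The names `encode`, `subsumes`, `glb`, and the parameters `constrained` (types that carry feature constraints of their own, like x' above) and `incompatible` are hypothetical.

```python
BOTTOM = "bottom"


def encode(parents):
    """Map each type to an int whose bits mark the type and all its subtypes."""
    index = {t: i for i, t in enumerate(parents)}
    codes = {t: 1 << index[t] for t in parents}
    changed = True
    while changed:                            # propagate subtype bits upwards
        changed = False
        for t, supers in parents.items():
            for s in supers:
                merged = codes[s] | codes[t]
                if merged != codes[s]:
                    codes[s] = merged
                    changed = True
    return codes


def subsumes(codes, general, specific):
    """True iff `specific` is a (reflexive) subtype of `general`."""
    return codes[specific] & codes[general] == codes[specific]


def glb(codes, a, b, closed_world=False, constrained=(), incompatible=()):
    """Generalized GLB: a type name, BOTTOM, or the conjunction ('and', a, b)."""
    for group in incompatible:                # explicit incompatibility declarations
        if all(subsumes(codes, g, a) or subsumes(codes, g, b) for g in group):
            return BOTTOM
    if subsumes(codes, a, b):
        return b
    if subsumes(codes, b, a):
        return a
    meet = codes[a] & codes[b]                # candidate computed from the codes
    for t, code in codes.items():
        # Verification step: a candidate with feature constraints of its own
        # is more specific than the plain conjunction and must be rejected.
        if code == meet and t not in constrained:
            return t
    return BOTTOM if closed_world else ("and", a, b)


# x := y & z, and x2 := y2 & z2 & [a 1]; s1 and s2 stand for sort types.
parents = {"top": [], "y": ["top"], "z": ["top"], "x": ["y", "z"],
           "y2": ["top"], "z2": ["top"], "x2": ["y2", "z2"],
           "s1": ["top"], "s2": ["top"]}
codes = encode(parents)
```

With these definitions, `glb(codes, "y", "z")` returns `"x"`, whereas `glb(codes, "y2", "z2", constrained={"x2"})` returns the conjunction `("and", "y2", "z2")`, and `glb(codes, "s1", "s2", closed_world=True)` returns `BOTTOM`.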



The encoding algorithm is also extended towards
the redefinition of types and the use of undefined
types, an essential part of an incremental
grammar/lexicon development system. Redefining a type
does not only mean making changes local to this type.
Instead, one has to redefine all dependents of this
type: all subtypes in case of a conjunctive type
definition and all disjunction alternatives for a
disjunctive type specification plus, in both cases, all types
which use these types in their definition. The
dependent types of a type t can be characterized
graph-theoretically via the strongly connected component
of t with respect to the dependency relation.

3.2.2 Decomposing Type Definitions 

Conjunctive type specifications, e.g., x := y ∧ z, and
disjunctive ones, e.g., x' := y' ∨ z', are entered
differently into the hierarchy: x inherits from its
supertypes y and z, whereas x' defines itself through its
alternatives y' and z'. This distinction is represented
through the use of different kinds of edges in the
type graph (bold edges denote disjunction elements;
see Fig. 3). But it is worth noting that both of them
express subsumption (x ≼ y and x' ≽ y') and that
the GLB/LUB operations must work properly over
"conjunctive" as well as "disjunctive" subsumption
links.

Figure 2: The intermediate types |u ∧ v| and |u ∧ v ∧ w|
are introduced by TDC during the type definitions
x := u ∧ v ∧ [a 0] and y := w ∧ v ∧ u ∧ [a 1].

Figure 3: Decomposing a := b ⊕ c, such that a inherits
from the intermediates |b ∨ c| and |¬b ∨ ¬c|.

TDC decomposes complex definitions consisting of
∧, ∨, and ¬ by introducing intermediate types, so
that the resulting expression is either a pure
conjunction or a disjunction of type symbols. Intermediate
type names are enclosed in vertical bars (cf. the
intermediate types |u ∧ v| and |u ∧ v ∧ w| in Fig. 2).

The same technique is applied when using ⊕ (see
Fig. 3). ⊕ will be decomposed into ∧, ∨ and ¬, plus
additional intermediates. For each negated type ¬t,
TDC introduces a new intermediate type symbol |¬t|
having the definition ¬t and declares it incompatible
with t (see Section 3.2.3). In addition, if t is not
already present, TDC will add t as a new type to the
hierarchy (see types |¬b| and |¬c| in Fig. 3).
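
That exclusive or can be expressed through ∧, ∨ and ¬ alone is easy to check mechanically. The sketch below is our own illustration and not part of TDC: it compares the truth tables of the disjunctive form of b ⊕ c, its distributed conjunctive form, and the reduced conjunctive normal form (b ∨ c) ∧ (¬b ∨ ¬c); the function names are hypothetical.

```python
from itertools import product


def xor_dnf(b, c):
    """(b and not c) or (not b and c): xor in disjunctive form."""
    return (b and not c) or (not b and c)


def xor_cnf_distributed(b, c):
    """Result of distributing the disjunctive form, before any reduction."""
    return (b or not b) and (b or c) and (not b or not c) and (not c or c)


def xor_cnf_reduced(b, c):
    """After dropping the tautologies (b or not b) and (not c or c)."""
    return (b or c) and (not b or not c)


def equivalent(f, g):
    """Compare two binary boolean functions on all four assignments."""
    return all(bool(f(b, c)) == bool(g(b, c))
               for b, c in product([False, True], repeat=2))
```

All three forms agree with the built-in inequality test `b != c` on booleans, which is what licenses introducing one intermediate type per disjunctive clause.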

Let's consider the example a := b ⊕ c. The
decomposition can be stated informally by the following
rewrite steps (assuming that the user has chosen
CNF):

a := b ⊕ c
a := (b ∧ ¬c) ∨ (¬b ∧ c)
a := (b ∨ ¬b) ∧ (b ∨ c) ∧ (¬b ∨ ¬c) ∧ (¬c ∨ c)
a := (b ∨ c) ∧ (¬b ∨ ¬c)
a := |b ∨ c| ∧ |¬b ∨ ¬c|

3.2.3 Incompatible Types and Bottom Propagation

Incompatible types lead to the introduction of
specialized bottom symbols (see Fig. 3 and 4), which,
however, are identified in the underlying logic in that they
denote the empty set. These bottom symbols must be
propagated downwards by a mechanism called bottom
propagation, which takes place at definition time (see
Fig. 4). Note that it is important to take not only
subtypes of incompatible types into account but also
disjunction elements, as the following example shows:

⊥ = a ∧ b.
b := b1 ∨ b2.
⟹ (BP)  a ∧ b1 = ⊥ and a ∧ b2 = ⊥
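
The effect of bottom propagation on such declarations can be sketched as follows. This is a deliberately naive illustration under our own assumptions: it enumerates all derived incompatible sets eagerly, whereas the mechanism described here propagates bottom symbols through the type graph. The mapping `below` (from a type to its direct subtypes and disjunction elements) and the function names are hypothetical.

```python
from itertools import product


def descendants(below, t):
    """Return t together with everything reachable via subtype or disjunction links."""
    seen, stack = {t}, [t]
    while stack:
        for child in below.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def propagate_bottom(incompatible, below):
    """Derive every set of types whose conjunction must simplify to bottom."""
    derived = set()
    for group in incompatible:
        # Each declared member may be replaced by any type below it.
        options = [sorted(descendants(below, t)) for t in sorted(group)]
        for choice in product(*options):
            derived.add(frozenset(choice))
    return derived


# bottom = a & b, with b := b1 | b2 (disjunction elements of b).
derived = propagate_bottom([{"a", "b"}], {"b": ["b1", "b2"]})
```

For the example from the text, `derived` contains the declared set {a, b} together with {a, b1} and {a, b2}.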



One might expect that incompatibility statements
together with feature term unification no longer lead
to a monotonic, set-theoretical semantics. But this
is not the case. To preserve monotonicity, one must
assume a 2-level interpretation of typed feature
structures, where feature constraints and type constraints
might denote different sets of objects and the global
interpretation is determined by the intersection of
the two sets. Take for instance the type definitions
A := [a 1] and B := [b 1], plus the user declaration
⊥ = A ∧ B, meaning that A and B are incompatible.
Then A ∧ B will simplify to ⊥ although the
corresponding feature structures of A and B successfully
unify to [a 1, b 1], thus the global interpretation is ⊥.
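
The 2-level interpretation can be made concrete with a small sketch. It is an illustration only, assuming flat feature structures represented as Python dictionaries and a set of declared incompatible pairs; `unify` and `typed_unify` are hypothetical names, and `None` stands for ⊥.

```python
def unify(fs1, fs2):
    """Unify two flat feature structures; None signals a clash."""
    merged = dict(fs1)
    for attr, value in fs2.items():
        if attr in merged and merged[attr] != value:
            return None
        merged[attr] = value
    return merged


def typed_unify(t1, t2, definitions, incompatible):
    """Global result: bottom as soon as either the type or the feature level fails."""
    if frozenset((t1, t2)) in incompatible:   # type level
        return None
    return unify(definitions[t1], definitions[t2])  # feature level


definitions = {"A": {"a": 1}, "B": {"b": 1}}
incompatible = {frozenset(("A", "B"))}
feature_level = unify(definitions["A"], definitions["B"])
global_level = typed_unify("A", "B", definitions, incompatible)
```

Here `feature_level` is the structure with both features, while `global_level` is `None`, mirroring the intersection of the two denotations.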



3.3 Symbolic Simplifier 

The simplifier operates on arbitrary TDC expressions.
Simplification is done at definition time and at run
time when typed unification takes place (cf. Fig. 1).
The main issue of symbolic simplification is to avoid
(i) unnecessary feature constraint unification and (ii)
queries to the type hierarchy by simply applying
"syntactic" reduction rules. Consider an expression
like x_1 ∧ ... ∧ x_i ∧ ... ∧ ¬x_i ∧ ... ∧ x_n. The simplifier
will detect ⊥ by simply applying reduction rules.

The simplification schemata are well known from
the propositional calculus. They are hard-wired in
the implementation to speed up computation.
Formally, type simplification in TDC can be characterized
as a term rewriting system. A set of reduction
rules is applied until a normal form is reached.
Confluence and termination are guaranteed by imposing
a total generalized lexicographic order on terms (see
below). In addition, this order has the nice effect
of making commutativity superfluous (which is expensive
and might lead to termination problems): there is only
one representative for a given formula. Therefore,
memoization is cheap and is employed in TDC to
reuse precomputed results of simplified expressions
(one need not cover all permutations of a formula).
Additional reduction rules are applied at run time
using "semantic" information of the type hierarchy
(GLB, LUB, and ≼).
[Figure 4 (diagram): type graph for the declarations
⊥ = a ∧ b ∧ c, d := b ∧ [p +], and e := b ∧ [p −],
showing bottom propagation of the bottom symbol
⊥{a,b,c}.]

Figure 4: Bottom propagation triggered through the subtypes d and e of b, so that a ∧ d ∧ c as well as a ∧ e ∧ c
will simplify to ⊥ during processing.



3.3.1 Normal Form 

In order to reduce an arbitrary type expression to 
a simpler expression, simplification rules must be ap- 
plied. So we have to define what it means for an 
expression to be "simple" . One can either choose the 
conjunctive or disjunctive normal form. The advan- 
tages of CNF/DNF are: 

• UNIQUENESS 

Type expressions in normal form are unique 
modulo commutativity. Sorting type expressions 
according to a total lexicographic order will lead 
to a total uniqueness of type expressions (see 
Section 3.3.3). 

• LINEARITY 

Type expressions in normal form are linear. Ar- 
bitrary nested expressions can be transformed 
into flat expressions. This may reduce the com- 
plexity of later simplifications, e.g., at run time. 

• COMPARABILITY 

This property is a consequence of the two other 
properties. Unique and linear expressions make 
it easy to find or to compare (sub)expressions. 
This is important for the memoization technique 
described in Section 3.3.4. 

3.3.2 Reduction Rules 

In order to reach a normal form, it would suffice 
to apply only the schemata for double negation, dis- 
tributivity, and De Morgan's laws. However, in the 
worst case, these three rules would blow up the length 
of the normal form to exponential size (compared 
with the number of literals in the original expres- 
sion). To avoid this, other rules are used intermedi- 
ately: idempotence, identity, absorption, etc. If they 
can be applied, they always reduce the length of the 
expressions. Especially at run time, but also at
definition time, it is useful to exploit information from
the type hierarchy. Further simplifications are
possible by asking for the GLB, LUB, and ≼.

3.3.3 Lexicographic Order 

To avoid the application of the commutativity rule, 
we introduce a total lexicographic order on type ex- 
pressions. Together with DNF/CNF, we obtain a 
unique sorted normal form for an arbitrary type ex- 
pression. This guarantees fast comparability. 



We define the order <_nf on n-ary normal forms:
type <_nf negated type <_nf conjunction <_nf
disjunction <_nf symbol <_nf string <_nf number. For
the comparison of atoms, strings, and type names,
we use the lexicographical order on strings and for
numbers the ordering < on natural numbers.

Example: a <_nf b <_nf bb <_nf ¬d <_nf a ∧ b <_nf
a ∧ ¬a <_nf a ∨ b <_nf a ∨ b ∨ c <_nf a ∨ 1
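
A sort key realizing such an order can be sketched as follows. This is an illustration under our own assumptions rather than the TDC definition: expressions are nested tuples such as `("and", "a", ("not", "a"))`, type names are Python strings, numbers are Python integers, and symbols and strings are left out for brevity. The names `RANK`, `key`, and `normalize` are hypothetical.

```python
RANK = {"not": 1, "and": 2, "or": 3}   # type names get rank 0, numbers rank 4


def key(expr):
    """Sort key: first the kind of expression, then its parts, left to right."""
    if isinstance(expr, int):
        return (4, expr)
    if isinstance(expr, str):
        return (0, expr)
    op, *args = expr
    if op == "not":
        return (RANK[op], key(args[0]))
    return (RANK[op], tuple(key(a) for a in args))


def normalize(expr):
    """Sort and deduplicate the arguments of every conjunction and disjunction."""
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [normalize(a) for a in args]
    if op == "not":
        return ("not", args[0])
    return (op, *sorted(set(args), key=key))


# The example chain from the text, written as nested tuples.
chain = ["a", "b", "bb", ("not", "d"), ("and", "a", "b"),
         ("and", "a", ("not", "a")), ("or", "a", "b"),
         ("or", "a", "b", "c"), ("or", "a", 1)]
```

Sorting any permutation of `chain` with `key` restores the order shown in the example, and `normalize` maps all permutations of a conjunction or disjunction to one representative, so commutativity never has to be applied as a rule.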

3.3.4 Memoization 

The memoization technique described in [10] has 
been adapted in order to reuse precomputed results of 
type simplification. The lexicographically sorted nor- 
mal form guarantees fast access to precomputed type 
simplifications. Memoization results are also used by 
the recursive simplification algorithm to exploit pre- 
computed results for subexpressions. 

Some empirical results show the usefulness of mem- 
oization. The current DISCO grammar for Ger- 
man consists of 885 types and 27 templates. Af- 
ter a full type expansion of a toy lexicon of 244 in- 
stances/entries, the memoization table contains ap- 
prox. 3000 entries (literals are not memoized). 18000 
results have been reused at least once (some up to 
600 times) of which 90 % are proper simplifications 
(i.e., the simplified formulae are really shorter than 
the unsimplified ones). 
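
How a sorted normal form makes memoization cheap can be sketched for the simplest case, a conjunction of literals. The code is a minimal illustration under our own assumptions, not the adapted technique of [10]: literals are strings, a leading "~" marks negation, and only idempotence and the detection of complementary literals are implemented. The class name `Simplifier` and its attributes are hypothetical.

```python
class Simplifier:
    """Simplify conjunctions of literals and cache results by normal form."""

    def __init__(self):
        self.table = {}     # sorted normal form -> simplified result
        self.reused = 0     # how often a cached result was reused

    def conj(self, literals):
        # Idempotence plus a fixed order: one representative per formula.
        canonical = tuple(sorted(set(literals)))
        if len(canonical) > 1 and canonical in self.table:
            self.reused += 1
            return self.table[canonical]
        result = self._reduce(canonical)
        if len(canonical) > 1:              # single literals are not memoized
            self.table[canonical] = result
        return result

    @staticmethod
    def _reduce(canonical):
        present = set(canonical)
        # x and ~x in the same conjunction: detect bottom syntactically.
        if any("~" + lit in present for lit in canonical):
            return "bottom"
        return canonical[0] if len(canonical) == 1 else canonical


s = Simplifier()
first = s.conj(["b", "a", "~b"])
second = s.conj(["~b", "a", "b"])   # a permutation hits the same table entry
```

Because both calls share one sorted representative, the second call is answered from the table without reapplying any reduction rule.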

3.4 Type Expansion and Control 

We noted earlier that types allow us to refer to com- 
plex constraints through the use of symbol names. 
Reconstructing the constraints which determine a 
type (represented as a feature structure) requires a 
complex operation called type expansion. This is
comparable to Carpenter's total well-typedness [5].

3.4.1 Motivation 

In TDC, the motivation for type expansion is
manifold:

• CONSISTENCY 

At definition time, type expansion determines 
whether the set of type definitions (grammar and 
lexicon) is consistent. At run time, type expan- 
sion is involved in checking the satisfiability of 
the unification of two partially expanded typed 
feature structures, e.g., during parsing. 
• ECONOMY

From the standpoint of efficiency, it does make
sense to work only with small, partially expanded
structures (if possible) to speed up feature
term unification and to reduce the amount of
copying. At the end of processing however, one
has to make the result/constraints explicit.

• RECURSION 

Recursive types are inherently present in modern
constraint-based grammar theories like HPSG
which are not provided with a context-free
backbone. Moreover, if the formalism does not
allow functional or relational constraints, one must
specify certain functions/relations like append
through recursive types. Take for instance
Aït-Kaci's version of the append type, which can be
stated in TDC as follows:

append := append0 ∨ append1.
append0 := [FRONT < >,
            BACK #1 ∧ list,
            WHOLE #1].
append1 := [FRONT < #first . #rest1 >,
            BACK #back ∧ list,
            WHOLE < #first . #rest2 >,
            PATCH append ∧ [FRONT #rest1,
                            BACK #back,
                            WHOLE #rest2]].

• TYPE DEDUCTION 

Parsing and generation can be seen in the light of
type deduction as a uniform process, where ideally
only the phonology (for parsing) or the semantics
(for generation) must be given. Type expansion
together with a sufficiently specified grammar
then is responsible in both cases for constructing
a fully specified feature structure which
is maximally informative and compatible with the
input. However, [15] has shown that type
expansion without sophisticated control strategies
is in many cases inefficient and moreover does
not guarantee termination.
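
The append type from the RECURSION item can be read operationally as follows. The sketch is a functional reading under our own assumptions: FRONT and BACK are given as Python lists, and the function `expand_append` (a hypothetical name) returns the structure that results from choosing append0 for the empty FRONT and append1 otherwise. The type itself is a relation that is resolved by unification, which the sketch does not model.

```python
def expand_append(front, back):
    """Return the resolved structure of append for a given FRONT and BACK."""
    if not front:
        # Alternative append0: WHOLE is coreferent with BACK.
        return {"FRONT": [], "BACK": back, "WHOLE": back}
    # Alternative append1: split FRONT and recurse under PATCH.
    first, *rest1 = front
    patch = expand_append(rest1, back)
    return {
        "FRONT": front,
        "BACK": back,
        "WHOLE": [first] + patch["WHOLE"],   # < #first . #rest2 >
        "PATCH": patch,
    }


result = expand_append([1, 2], [3])
```

Each level of nesting under PATCH corresponds to one unfolding of the recursive type, so the depth of the result equals the length of FRONT.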

3.4.2 Controlled Type Expansion 

Uszkoreit [13] introduced a new strategy for lin- 
guistic processing called controlled linguistic deduc- 
tion. His approach permits the specification of lin- 
guistic performance models without giving up the 
declarative basis of linguistic competence, especial- 
ly monotonicity and completeness. The evaluation of 
both conjunctive and disjunctive constraints can be 
controlled in this framework. For conjunctive con- 
straints, the one with the highest failure probability 
should be evaluated first. For disjunctive ones, a suc- 
cess measure is used instead: the alternative with the 
highest success probability is used until a unification 
fails, in which case one has to backtrack to the next 
best alternative. 
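
The two ordering heuristics can be sketched in a few lines. This is an illustration only: how the probabilities are obtained is not specified here, so the numbers below are invented for the example, and the function names as well as the callback `holds` (which stands for the actual unification test) are hypothetical.

```python
def evaluate_conjunction(constraints, holds):
    """Check conjuncts by decreasing failure probability; stop at the first failure."""
    evaluated = []
    for name, _p_fail in sorted(constraints, key=lambda c: c[1], reverse=True):
        evaluated.append(name)
        if not holds(name):
            return False, evaluated
    return True, evaluated


def evaluate_disjunction(alternatives, holds):
    """Try disjuncts by decreasing success probability; backtrack on failure."""
    tried = []
    for name, _p_success in sorted(alternatives, key=lambda a: a[1], reverse=True):
        tried.append(name)
        if holds(name):
            return name, tried
    return None, tried


# Invented figures, for illustration only.
conjuncts = [("agreement", 0.1), ("subcat", 0.7), ("case", 0.4)]
ok, evaluated = evaluate_conjunction(conjuncts, lambda n: n != "subcat")
disjuncts = [("noun", 0.2), ("verb", 0.5), ("adj", 0.3)]
chosen, tried = evaluate_disjunction(disjuncts, lambda n: n == "adj")
```

In the conjunctive case the failing constraint is found after a single test; in the disjunctive case the most promising alternative fails and the next best one is tried.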

TDC and UDiNe support this strategy in that every
feature structure can be associated with its
success/failure potential such that type expansion can be
sensitive to these settings. Moreover, one can make
other decisions as well during type expansion:

• only regard structures which are subsumed by a
given type resp. the opposite case (e.g., always expand
the type subcat-list or never expand the
type daughters)

• take into account only structures under certain
paths or again assume the opposite case
(e.g., always expand the value under path
SYNSEM|LOC|CAT; in addition, it is possible to
employ path patterns in the sense of pattern
matching)

• set the depth of type expansion for a given type

Note that we are not restricted to applying only one
of these settings: they can be used in combination
and can be changed dynamically during processing.
It does make sense, for instance, to expand the (partial)
information obtained so far at certain well-defined
points during parsing. If this does not result in a
failure, one can throw away (resp. store) this fully
expanded feature structure and continue working with the
older (and smaller) one. However, if the information is
inconsistent, we must backtrack to older stages in the
computation. Proceeding this way, which of course assumes
heuristic knowledge (language- as well as
grammar-specific knowledge), results in faster processing and
copying. Moreover, the inference engine must be able
to handle possibly inconsistent knowledge, e.g., in
the case of a chart parser, by allowing for a third kind of
edge (besides active and passive ones).

3.4.3 Recursive Types, Implementational 
Issues, and Undecidability 

The set of all recursive types of a given gram- 
mar/lexicon can be precompiled by employing the 
dependency graph of this type system. This graph 
is updated every time a new type definition is added 
to the system. Thus detecting whether a given type 
is recursive or not reduces to a simple table look-up. 
However, the expansion of a recursive type itself is a
little bit harder. In TDC, we are using a lazy
expansion technique which only makes those constraints
explicit which are really new. To put it another
way: if no (global or local) control information
is specified to guide a specific expansion, a recursive
type will be expanded under all its paths (local
plus inherited paths) until one reaches a point where
the information is already given in a prefix path. We
call such an expanded structure a resolved typed
feature structure. Of course, there are infinitely many
resolved feature structures, but this structure is the
most general resolved one.

Take for instance the append example from the
previous section. append is of course a recursive
type because one of its alternatives, viz., append1,
uses append under the PATCH attribute. Expanding
append with no additional information supplied
(especially no path leading inside append1,
e.g., PATCH|PATCH|PATCH) yields a disjunctive feature
structure where both append0 and append1 are
substituted by their definition. The expansion then stops
if no other information enforces a further expansion.

In practice, one has to keep track of the visited
paths and visited typed feature structures to avoid
unnecessary expansion. To make expansion more
efficient, we mark structures as fully expanded
or not. A feature structure is then fully
expanded iff all of its substructures are fully expanded.
This simple idea leads to a massive reduction of the
search space when dealing with incremental
expansion (e.g., during parsing).
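
The marking scheme can be sketched as follows. This is an illustration under our own assumptions, not the TDC data structure: feature structures are assumed to be acyclic trees of `Node` objects, `expanded` records whether a node's own type has been unfolded, and the counter in `stats` only serves to show how many nodes are actually inspected. All names are hypothetical.

```python
class Node:
    """A typed feature structure node with an expansion mark."""

    def __init__(self, type_name, features=None, expanded=False):
        self.type_name = type_name
        self.features = features or {}   # attribute -> Node
        self.expanded = expanded         # has this node's own type been unfolded?
        self.fully_expanded = False      # cached mark for the whole substructure


def is_fully_expanded(node, stats):
    """Fully expanded iff the node and all of its substructures are."""
    if node.fully_expanded:              # marked earlier: skip the whole subtree
        return True
    stats["visited"] += 1
    children = [is_fully_expanded(c, stats) for c in node.features.values()]
    node.fully_expanded = node.expanded and all(children)
    return node.fully_expanded


num = Node("sg", expanded=True)
agr = Node("agreement", {"NUM": num}, expanded=True)
args = Node("list", expanded=False)
root = Node("np", {"AGR": agr, "ARGS": args}, expanded=True)

first = {"visited": 0}
before = is_fully_expanded(root, first)     # inspects all four nodes
args.expanded = True                        # a later, incremental expansion step
second = {"visited": 0}
after = is_fully_expanded(root, second)     # the AGR subtree is skipped
```

The second check visits only the root and the newly expanded node, since the AGR substructure was already marked during the first pass.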
It is worth noting that the satisfiability of
feature descriptions admitting recursive type
equations/definitions is in general undecidable. Rounds
and Manaster-Ramer [11] were the first to show
that a Kasper-Rounds logic enriched with recursive
types allows one to encode a Turing machine.
Because our logic is much richer, we immediately
get the same result for TDC.

However, one can choose in TDC between a
complete expansion algorithm which may not terminate
and a non-complete one to guarantee termination (see
[2] and [5, Ch. 15] for similar proposals). The latter
case heavily depends on the notion of resolvedness
(see above). In both cases, the depth of the search
space can be restricted by specifying a maximal path
length.

4 Comparison with other Systems 

TDC is unique in that it implements many novel
features not found in other systems like ALE [4], LIFE
[2], or TFS [15]. Of course, these systems provide
other features which are not present in our
formalism. What makes TDC unique in comparison to them
is the distinction open vs. closed world, the
availability of the full boolean connectives and distributed
disjunctions (via UDiNe), as well as an implemented
lazy type expansion mechanism for recursive types
(as compared with LIFE). ALE, for instance, allows
neither disjunctive nor recursive types and forces
the type hierarchy to be a BCPO. However, it makes
recursion available through definite relations and
incorporates special mechanisms for empty categories
and lexical rules. TFS comes with a closed world,
the unavailability of negative information (only
implicitly present) and only a poor form of disjunctive
information, but performs parsing and generation
entirely through type deduction (in fact, it was the first
system to do so). LIFE comes closest to us but provides a
semantics for types that is similar to TFS. Moreover,
the lack of negative information and distributed
disjunctions makes it again comparable with TFS. LIFE
as a whole can be seen as an extension of Prolog (as
was the case for its predecessor LOGIN), where
first-order terms are replaced by ψ-terms. In this sense,
LIFE is richer than our formalism in that it offers a
full relational calculus.

5 Summary and Outlook 

In this paper, we have presented TDC, a typed
feature formalism that integrates a powerful feature
constraint solver and type system. Both of them provide
the boolean connectives ∧, ∨, and ¬, where a
complex expression is decomposed by employing
intermediate types. Moreover, recursive types are supported
as well. In TDC, a grammar writer decides whether
types live in an open or a closed world. This
affects GLB and LUB computations. The type system
itself consists of several inference components, each
designed to cover efficiently a specific task: (i) a bit
vector encoding of the hierarchy, (ii) a fast symbolic
simplifier for complex type expressions, (iii)
memoization to cache precomputed results, and (iv) a
sophisticated type expansion mechanism. The system
as described in this paper has been implemented in
Common Lisp and integrated in the DISCO
environment [14].

The next major version of TDC will be integrated
into a declarative specification language which
allows linguists to define control knowledge that can be
used during processing. In addition, certain forms of
knowledge compilation will be made available in
future versions of TDC, e.g., the automatic detection of
syntactic incompatibilities between types, so that a
type computation can substitute an extensive feature
term unification.